Viability Test Flow

The viability test evaluates a set of URLs before you scrape them and recommends a scraping strategy for each one.

The complete flow has two steps:

  1. Submit the URLs for analysis and obtain a run_id.
  2. Query the run_id until the analysis is complete and read the recommended strategy.

1. Submit the URLs for analysis

Send the list of URLs you want to evaluate. The response is immediate and returns a run_id for tracking.

Endpoint

POST /v1/async/viability-test

Request Body Example

{
  "urls": [
    "https://www.example-static.com",
    "https://www.booking.com",
    "https://www.example-captcha.com"
  ]
}

Expected Response

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 3,
  "completed_urls": 0
}

Save the run_id -- it is needed to query the results in the next step.
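
As a minimal sketch of this step, the request could be built and sent with Python's standard library. The base URL below is a placeholder, not part of this documentation, and any authentication headers your account may require are not shown. The helper names build_submit_request and submit_urls are hypothetical.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the host of your deployment.
BASE_URL = "https://api.example.com"


def build_submit_request(urls, base_url=BASE_URL):
    """Build the POST request that submits URLs to the viability test."""
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/async/viability-test",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_urls(urls, base_url=BASE_URL):
    """Send the request and return the run_id from the JSON response."""
    with urllib.request.urlopen(build_submit_request(urls, base_url)) as resp:
        return json.load(resp)["run_id"]
```

Keeping the request construction separate from the network call makes it possible to inspect the payload without contacting the service.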


2. Query the results

The analysis runs in the background. Poll this endpoint periodically until status is completed; until then, results is null.

Endpoint

GET /v1/async/viability-test/{run_id}

Response while analysis is in progress

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "in_progress",
  "total_urls": 3,
  "completed_urls": 1,
  "results": null
}

Response when analysis is complete

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "status": "completed",
  "total_urls": 3,
  "completed_urls": 3,
  "results": [
    {
      "url": "https://www.example-static.com",
      "recommended_strategy": "extract_html",
      ...
    },
    {
      "url": "https://www.booking.com",
      "recommended_strategy": "browser",
      ...
    },
    {
      "url": "https://www.example-captcha.com",
      "recommended_strategy": "blocked",
      ...
    }
  ]
}
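
The polling step can be sketched as a small loop. This is an illustration, not a prescribed client. fetch_status is a hypothetical caller-supplied function that performs GET /v1/async/viability-test/{run_id} and returns the decoded JSON body. The interval and timeout defaults are arbitrary assumptions. Only the in_progress and completed statuses appear in this documentation, so the sketch treats any other status as "not finished yet" until the timeout expires.

```python
import time


def poll_until_complete(fetch_status, run_id, interval=5.0, timeout=300.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(run_id) until status is "completed"; return results."""
    deadline = clock() + timeout
    while True:
        body = fetch_status(run_id)
        if body["status"] == "completed":
            return body["results"]
        if clock() >= deadline:
            raise TimeoutError(
                f"run {run_id} still {body['status']!r} after {timeout}s"
            )
        # Wait before the next query so the endpoint is not hammered.
        sleep(interval)
```

Passing sleep and clock as parameters is a design choice that lets the loop be exercised in tests without real waiting.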

3. Interpret the results and scrape

Each URL in results includes a recommended_strategy. Below is a description of what each one means and how to proceed with scraping.


extract_html -- Static HTML available

The site serves its content directly in the HTTP response, without requiring JavaScript. No blocks were detected.

What to do: scrape without a browser. This is the fastest option and consumes the fewest resources.

{
  "url": "https://www.example-static.com",
  "recommended_strategy": "extract_html",
  "javascript_required": false,
  "captcha_detected": false,
  "cloudflare_level": "none",
  "can_use_extract_html": true,
  "can_use_browser": true
}

browser -- The site requires JavaScript

The main content is rendered via JavaScript. A simple HTTP request would return empty or incomplete HTML. The site does not present active blocks.

What to do: scrape with the browser enabled. The browser executes the JavaScript and waits for the content to become available.

{
  "url": "https://www.booking.com",
  "recommended_strategy": "browser",
  "javascript_required": true,
  "browser_confidence": 0.92,
  "captcha_detected": false,
  "can_use_extract_html": false,
  "can_use_browser": true
}

api -- Data endpoints detected

During the browser analysis, JSON or XHR endpoints were detected that expose the data directly, without requiring authentication. The endpoints are listed in api_endpoints.

What to do: scrape without a browser, pointing directly to the endpoints listed in api_endpoints. This is the most efficient strategy when available.

{
  "url": "https://www.example-booking.com",
  "recommended_strategy": "api",
  "api_detected": true,
  "api_endpoints": [
    "https://www.example-booking.com/avl",
    "https://www.example-booking.com/api/search"
  ],
  "api_auth_required": false,
  "can_use_simple_request": true
}
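
A small helper can pull the directly usable endpoints out of a result like the one above. This is a sketch under two assumptions: the helper name direct_endpoints is hypothetical, and a missing api_auth_required flag is treated as "authentication required" to stay on the safe side.

```python
def direct_endpoints(result):
    """Return the endpoints that can be requested without a browser, or []."""
    if result.get("recommended_strategy") != "api":
        return []
    # Assumption: if the flag is absent, behave as if auth were required.
    if result.get("api_auth_required", True):
        return []
    return list(result.get("api_endpoints") or [])
```

Each returned URL can then be requested directly, without a browser.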

blocked -- The site cannot be scraped under normal conditions

One or more active barriers were detected: captcha, Cloudflare challenge, or login wall. Scraping is not possible without additional intervention.

What to do: review the captcha_providers, cloudflare_level, and login_wall fields to understand which specific barrier was encountered. This may require the use of residential proxies, captcha solving, or manual analysis of the authentication flow.

{
  "url": "https://www.example-captcha.com",
  "recommended_strategy": "blocked",
  "captcha_detected": true,
  "captcha_providers": ["cloudflare"],
  "cloudflare_level": "challenge",
  "can_use_extract_html": false,
  "can_use_browser": false
}
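
To review which barrier was hit, the relevant fields can be summarized in one place. The sketch below makes several assumptions. It treats a cloudflare_level of "none" as "no Cloudflare barrier", matching the extract_html sample. It assumes login_wall is a truthy value when a login wall is present, because its exact type is not shown in this documentation. The helper name describe_barriers is hypothetical.

```python
def describe_barriers(result):
    """List the barriers reported for a blocked URL as short strings."""
    barriers = []
    if result.get("captcha_detected"):
        providers = result.get("captcha_providers") or ["unknown provider"]
        barriers.append("captcha (" + ", ".join(providers) + ")")
    level = result.get("cloudflare_level")
    if level and level != "none":
        barriers.append(f"Cloudflare level: {level}")
    # Assumption: login_wall is truthy when a login wall was detected.
    if result.get("login_wall"):
        barriers.append("login wall")
    return barriers
```

For the sample result above, this yields one entry for the captcha and one for the Cloudflare challenge.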

Summary

The viability test flow always follows the same two-step pattern:

  1. Submit the URLs -> obtain run_id.
  2. Query the run_id -> read recommended_strategy per URL and scrape accordingly.

Strategy       Needs browser   When
extract_html   No              Static HTML, no JS or blocks
browser        Yes             Content rendered with JS
api            No              JSON/XHR endpoints detected and accessible
blocked        --              Active captcha, Cloudflare challenge, or login wall